Specimen topography reconstruction

ABSTRACT

This method removes high frequency noise from shape data, significantly improves metrology system (10) performance and provides a very compact representation of the shape. This model-based method for wafer shape reconstruction from data measured by a dimensional metrology system (10) is best accomplished using the set of Zernike polynomials (matrix L). The method is based on decomposition of the wafer shape over a complete set of spatial functions. A weighted least squares fit is used to provide the best linear estimates of the decomposition coefficients (B_(nk)). The method is operable with data that is not taken at regular data points and generates a field of Zernike coefficients that is much smaller than the original data field.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority of U.S. Provisional Patent Application No. 60/174,082, entitled SPECIMEN TOPOGRAPHY RECONSTRUCTION, filed Dec. 30, 1999, incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] N/A

BACKGROUND OF THE INVENTION

[0003] Wafer shape is a geometric characteristic of a semiconductor wafer, which describes the position of the wafer's central plane surface in space. The bow, warp and other shape related parameters of semiconductor wafers must be within precise tolerances in order for wafers to be usable. The precision of a dimensional metrology (measurement) system must be tight enough to provide the required control over the quality of manufactured wafers.

[0004] The high accuracy metrology of test specimens, such as the topographic measurement of bow, warp, flatness, thickness etc. of such objects as semiconductor wafers, magnetic disks and the like, is impeded by the presence of noise in the output data. Depending on the inherent properties of the instrument and the environment, the data may have a noise content that displays a larger peak to peak magnitude than the actual dimensions being measured. It is difficult to remove all sources of wafer vibration in a sensor based dimensional metrology system when the wafer moves between the sensors. The natural frequency of wafer vibration is of the order of tens to a few hundred Hertz, depending on wafer size and loading conditions, and the observed pattern of vibration has a spatial wavelength less than a few mm. If this noise is not removed, it directly affects the repeatability and reproducibility of the measurements of the system.

[0005] The measurements for wafer shape are typically taken at a plurality of points over the specimen surface. The positions of those points are not rigorously controlled between specimens. Therefore, the same data point may not be from the same exact location on each specimen tested by a particular metrology unit. This limits the usefulness of such noise elimination techniques as correlation analysis. Similarly, the desire to process data for noise reduction from arbitrary shapes, particularly circular, reduces the attractiveness of high speed data systems such as Fast Fourier Transforms. Wafer shape is mostly a low spatial frequency characteristic. This makes it possible to remove vibration noise by using a low pass 2D spatial filter.

[0006] Convolution-based filters require a regular, evenly spaced data set that uses a priori information about the analytical continuation of the wafer shape beyond the wafer boundaries, e.g. the periodic behavior of the wafer shape. Because of this requirement for regular data and a priori information, conventional filters such as convolution techniques are not applicable for wafer shape vibration-noise removal. Fast Fourier transforms are an alternate high speed data processing method, but they are not well adapted to noise reduction processing from arbitrary non-rectilinear shapes, particularly circular shapes.

[0007] An analytical method for removing the noise content from metrology measurements of wafer specimens that accommodates the variability of data points is needed.

BRIEF SUMMARY OF THE INVENTION

[0008] This invention has application for wafer shape metrology systems where the wafer moves between two-dimensional sensors that scan it and the scan pattern is not necessarily evenly spaced in Cartesian co-ordinates.

[0009] The invention provides a method to reduce the noise in metrological data from a specimen's topography. The model-based method allows wafer shape reconstruction from data measured by a dimensional metrology system by quantifying the noise in the measurements. The method is based on decomposition of the wafer shape over the full set of spatial measurements. A weighted least squares fit provides the best linear estimate of the decomposition coefficients for a particular piece of test equipment. The fact that the wafer's shape is predominantly a low frequency spatial object guarantees fast convergence. An important advantage of the least squares fit method is that a regular grid of data points is not required to calculate the coefficients. Zernike polynomials are preferred for wafer shape reconstruction, as they operate with data that is not taken at regular data points and that represents circular objects.

[0010] At least one set of raw data from a measurement is analyzed to obtain a characterizing matrix of the Zernike type for that particular instrument. A least squares fit on the singular value decomposition of the data is used to initially calculate the matrix characterizing the instrument. Thereafter, this matrix does not need to be recalculated unless factors change the errors in the measurement instrumentation.

[0011] Data characterizing the topography of a specimen, in the form of Zernike coefficients, can be sent with specimens or telecommunicated anywhere. Because the Zernike coefficients are a complete characterization and are efficient in using minimum data space, this method significantly improves metrology system performance by removing high frequency noise from the shape data and providing a very compact representation of the shape.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

[0012] These and other objects, aspects and advantages of the present invention will become clear as the invention becomes better understood by referring to the following solely exemplary and non-limiting detailed description of the method thereof and to the drawings, wherein:

[0013] FIG. 1 shows apparatus for measuring the topography of a specimen, in particular of a semiconductor wafer;

[0014] FIG. 2 shows a visual scale image of specimen topography with noise;

[0015] FIG. 3 shows a visual scale image of specimen topography characteristic of the measurement apparatus;

[0016] FIG. 4 shows a visual scale image of specimen topography with noise reduction; and

[0017] FIG. 5 shows a graph of tighter consistency of measurement after use of the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0018] According to the present invention, and as shown in FIG. 1, a metrology system 10 receives a cassette 12 of semiconductor wafers 14 for testing of surface properties, such as those noted above. The wafers 14 are measured in a physical test apparatus 16, such as any of the ADE Corporation's well-known measurement stations, the WAFERCHECK™ systems being one such.

[0019] The physical test apparatus 16 outputs data to a processor 20 on a communications line 18. The data is typically a vector of measured wafer artifacts, such as flatness height, developed during a spiral scan of the wafer. The present invention operates to eliminate or reduce the noise from the wafer measurement system.

[0020] The raw noisy data is typically stored in a memory area 22 where its vector can be represented as W(ρ, θ), where ρ is the normalized (r/radius) radial location of each measurement point, and θ is the angle in polar coordinates of the measurement point. The processor 20 performs a transform on this data using a previously calculated matrix, L, which represents the noise characteristic of the measurement station 10. This transform outputs the coefficients of a function that gives the noise reduced topography of the specimen at each desired point. The specimen shape is normalized for noise data alone. The outputs are fed to an input/output interface 30 that may transmit the output to a remote location. The coefficients may also be transmitted from the I/O unit 30 to remote locations, or sent along with the specimen on a data carrier, the Internet or any other form as desired.

[0021] The previously calculated matrix, L, is advantageously represented as a Zernike polynomial. Zernike polynomials were introduced [F. Zernike, Physica, 1 (1934), 689] and used to describe aberration and diffraction in theoretical and applied optics. These 2D polynomials represent a complete orthogonal set of functions over the unit circle. Any differentiable function defined over the finite radius circle can be represented as a linear combination of Zernike polynomials. There is no need for a priori information as is the case for convolution techniques. Zernike polynomials are invariant relative to rotation of the coordinate system around an axis normal to the wafer plane. This invariance aids in shape data analysis, especially for data having orientation dependencies. The spectrum of Zernike decomposition coefficients has analogues to power spectral density in Fourier space. A consequence of the invariance is that the spectrum loses spatial significance, much as a Fourier series loses time relationships.

[0022] The transform from shape W(r,θ) onto Zernike functional space (n,k) is expressed as: $W(r,\theta) = \sum_{n,k} B_{nk}\, R_{n}^{k}(\rho)\, \exp(-ik\theta), \quad (1)$

[0023] where, (r,θ) are data point polar coordinates,

[0024] ρ=r/wafer radius,

[0025] B_(nk) is the decomposition coefficient, and $R_{n}^{k}(\rho) = \sum_{s=0}^{(n-k)/2} (-1)^{s}\, \frac{(n-s)!}{s!\,\left(\frac{n+k}{2}-s\right)!\,\left(\frac{n-k}{2}-s\right)!}\, \rho^{\,n-2s} \quad (2)$

[0026] Where n and k and s are arbitrary variables of synthetic space.
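As an illustration only, equation (2) can be evaluated directly from its factorial form. The following Python sketch is not part of the original disclosure. The function name `zernike_radial` is hypothetical, and the sketch assumes the usual Zernike convention that n − |k| is even and non-negative, with the radial factor depending only on |k| (the sign of k is carried by the exponential in equation (1)).

```python
from math import factorial


def zernike_radial(n, k, rho):
    """Evaluate the radial polynomial R_n^k(rho) of equation (2).

    Assumption (not stated in the text): terms with n < |k| or with
    n - |k| odd are treated as zero, following the common convention.
    """
    k = abs(k)  # the radial part depends only on |k|
    if n < k or (n - k) % 2:
        return 0.0
    total = 0.0
    for s in range((n - k) // 2 + 1):
        numerator = (-1) ** s * factorial(n - s)
        denominator = (factorial(s)
                       * factorial((n + k) // 2 - s)
                       * factorial((n - k) // 2 - s))
        total += numerator / denominator * rho ** (n - 2 * s)
    return total


# R_2^0(rho) = 2*rho**2 - 1, so at rho = 0.5 the value is -0.5
print(zernike_radial(2, 0, 0.5))
```

With this convention every valid R_n^k equals 1 at the wafer edge (ρ = 1), which gives a quick sanity check on an implementation.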

[0027] The decomposition coefficients B_(nk) are calculated from the system of linear equations (1). This system is overdetermined, in that the number of equations (one for each data point) is two orders of magnitude greater than the number of coefficients B_(nk) (unknowns).

[0028] The B_(nk) decomposition coefficients can be kept to a small number, typically around 100, by selection of the limits on n, and on k, which varies from −n to +n integrally. The data range typically is large enough to accurately sample the noise being cancelled, while small enough to be manageable. The spatial filtering is a result of the limit on the range for s, which is allowed to grow in the range 0 . . . n. For wafer metrology, an n of about 10 filters out the noise component described above for the ADE Corporation equipment.
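To make the size of the coefficient set concrete, the index pairs (n, k) can be enumerated for a chosen upper limit on n. This is a hedged sketch, not the disclosed implementation: the helper name `zernike_indices` is hypothetical, and it assumes the standard Zernike restriction that k steps from −n to +n in increments of 2, so that the upper summation limit (n − k)/2 in equation (2) is an integer.

```python
def zernike_indices(n_max):
    """List (n, k) pairs with 0 <= n <= n_max and k = -n, -n + 2, ..., n.

    Assumption: only pairs with n - |k| even are kept (standard Zernike
    convention); the text itself only says k runs from -n to +n.
    """
    return [(n, k) for n in range(n_max + 1) for k in range(-n, n + 1, 2)]


for n_max in (10, 13):
    print(n_max, len(zernike_indices(n_max)))
# prints:
# 10 66
# 13 105
```

Under this assumption the count is (n_max + 1)(n_max + 2)/2, so limits on n in the range of roughly 10 to 13 keep the number of unknowns on the order of 100, far below the number of measured points.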

[0029] The system of equations (1) is solved using the weighted least squares fit, because a weighted least squares fit overcomes measurement errors in the input data. Weightings are determined based on the reliability of data; when data is more reliable (exhibits smaller variances), it is weighted more heavily. The calculated covariance matrix is used to assign weight to data points. Using the statistical weightings improves the fit of the output.

[0030] According to Strang [Strang, G., Introduction to Applied Mathematics, Wellesley-Cambridge, 1986, p. 398], the best unbiased (without preconditions) solution of the system (1) can be written as

B=(A ^(T)Σ⁻¹ A)⁻¹ A ^(T)Σ⁻¹ W,  (3)

[0031] where,

[0032] B—vector of decomposition coefficients,

[0033] A—matrix of {R_(n)^(k)(ρ_(j))exp(−ikθ_(j))}, j=1,2, . . ., number of measured points.

[0034] T—stands for transpose matrix.

[0035] Σ⁻¹—inverse of the covariance matrix Σ.

[0036] W—vector of measured values W(ρ_(j),θ_(j)).
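The solution (3) can be sketched numerically as follows. This is an illustrative example under stated assumptions, not the disclosed implementation: the helper names (`radial`, `design_matrix`, `weighted_fit`) are hypothetical, the covariance matrix Σ is assumed diagonal (one variance per data point), the synthetic scan points and noise level are invented for the demonstration, and because the basis functions are complex the transpose in (3) is taken as the conjugate transpose.

```python
from math import factorial

import numpy as np


def radial(n, k, rho):
    """Radial polynomial of equation (2); rho may be a NumPy array."""
    k = abs(k)
    return sum((-1) ** s * factorial(n - s)
               / (factorial(s) * factorial((n + k) // 2 - s)
                  * factorial((n - k) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - k) // 2 + 1))


def design_matrix(rho, theta, n_max):
    """Matrix A: one row per data point, one column per (n, k) pair."""
    idx = [(n, k) for n in range(n_max + 1) for k in range(-n, n + 1, 2)]
    cols = [radial(n, k, rho) * np.exp(-1j * k * theta) for n, k in idx]
    return np.column_stack(cols), idx


def weighted_fit(A, w, variances):
    """B = (A^H S^-1 A)^-1 A^H S^-1 w for diagonal S (assumed)."""
    Ah_Sinv = A.conj().T / variances  # scales column j by 1/variance_j
    return np.linalg.solve(Ah_Sinv @ A, Ah_Sinv @ w)


# Synthetic, irregularly placed scan points on the unit disk (invented data).
rng = np.random.default_rng(0)
n_pts = 2000
rho = np.sqrt(rng.uniform(0.0, 1.0, n_pts))
theta = rng.uniform(0.0, 2.0 * np.pi, n_pts)

A, idx = design_matrix(rho, theta, n_max=4)
B_true = np.zeros(len(idx), dtype=complex)
B_true[idx.index((2, 0))] = 5.0   # bowl-like term
B_true[idx.index((2, 2))] = 1.0   # paired with (2, -2) to keep the shape real
B_true[idx.index((2, -2))] = 1.0

variances = np.full(n_pts, 0.25)  # noise standard deviation 0.5 per point
w = (A @ B_true).real + rng.normal(0.0, 0.5, n_pts)

B = weighted_fit(A, w, variances)
print(np.round(B[idx.index((2, 0))], 2))
```

Nothing in the fit relies on the points lying on a regular grid; the scan pattern enters only through the rows of A.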

[0037] The matrix L=(A^(T)Σ⁻¹A)⁻¹A^(T)Σ⁻¹ in front of W in solution (3) does not depend on actual measured values. Therefore, for a given scan pattern it can be pre-calculated and stored in a computer memory. Matrix value L will need to be recalculated each time the error function of the instrument changes. The matrix value L is calculated using the Singular Value Decomposition (SVD) method [Forsythe, G. E., Moler, C. B., Computer Solution of Linear Algebraic Systems, Prentice-Hall, 1971]. SVD does not require evenly sampled data points.
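One way to pre-calculate L with an SVD is sketched below. It is an assumption-laden illustration rather than the disclosed procedure: Σ is taken to be diagonal, A is assumed to have full column rank, the transpose is taken as the conjugate transpose because the basis is complex, and the helper names and the four-term basis used for the demonstration are hypothetical. Writing S = Σ^(−1/2)A = U·diag(s)·V^H gives L = V·diag(1/s)·U^H·Σ^(−1/2), which is algebraically the same matrix as in solution (3).

```python
import numpy as np


def small_design_matrix(rho, theta):
    """Columns for (n, k) = (0, 0), (1, -1), (1, 1), (2, 0) only."""
    return np.column_stack([
        np.ones_like(rho, dtype=complex),
        rho * np.exp(1j * theta),    # k = -1: exp(-i k theta) = exp(i theta)
        rho * np.exp(-1j * theta),   # k = +1
        2.0 * rho ** 2 - 1.0,        # R_2^0
    ])


def precompute_L(A, variances):
    """L = (A^H S^-1 A)^-1 A^H S^-1 via SVD of the whitened matrix."""
    inv_sigma = 1.0 / np.sqrt(variances)
    S = A * inv_sigma[:, None]                  # whitened design matrix
    U, s, Vh = np.linalg.svd(S, full_matrices=False)
    # Assumes all singular values are nonzero (full column rank).
    return (Vh.conj().T / s) @ (U.conj().T * inv_sigma)


# Invented scan pattern and per-point variances for the demonstration.
rng = np.random.default_rng(1)
n_pts = 500
rho = np.sqrt(rng.uniform(0.0, 1.0, n_pts))
theta = rng.uniform(0.0, 2.0 * np.pi, n_pts)
variances = rng.uniform(0.5, 2.0, n_pts)

A = small_design_matrix(rho, theta)
L = precompute_L(A, variances)       # computed once per scan pattern

# Each new measurement vector then needs only one matrix multiplication.
B_true = np.array([1.0, 0.5, 0.5, 5.0], dtype=complex)
w1 = (A @ B_true).real               # noise-free synthetic scan
B1 = L @ w1
print(np.round(B1.real, 6))
```

Because L depends only on the point positions and the variances, it can be stored and reused for every wafer measured with the same scan pattern.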

[0038] Once L is determined, only one matrix multiplication is required to calculate the unknowns in B. This procedure, when implemented, is as fast as a Fast Fourier Transform but avoids the 2D Fast Fourier Transform's difficulties dealing with the wafers' circular boundaries and any non-Cartesian scan pattern.

[0039] The processor 20 of FIG. 1 can output either the Zernike coefficients of the actual wafer, or the output can be in the form of W(r,θ) that gives the noise reduced topography of the specimen or wafer at any desired point. W(r,θ) can be calculated from the Zernike coefficients.
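Recovering W(r,θ) at an arbitrary point from a stored set of coefficients amounts to evaluating the sum in equation (1). The sketch below is illustrative only: the function names, the dictionary layout for the coefficients, the sample coefficient values and the 100 mm wafer radius are all hypothetical. It also assumes a real-valued shape, for which the coefficients of +k and −k are complex conjugates, so only the real part of the sum is returned.

```python
import cmath
from math import factorial


def radial(n, k, rho):
    """Radial polynomial of equation (2), using |k|."""
    k = abs(k)
    return sum((-1) ** s * factorial(n - s)
               / (factorial(s) * factorial((n + k) // 2 - s)
                  * factorial((n - k) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - k) // 2 + 1))


def reconstruct(coeffs, r, theta, wafer_radius):
    """Evaluate equation (1) at one point (r, theta).

    coeffs maps (n, k) to B_nk; returns the real part of the sum.
    """
    rho = r / wafer_radius
    total = sum(B * radial(n, k, rho) * cmath.exp(-1j * k * theta)
                for (n, k), B in coeffs.items())
    return total.real


# Hypothetical coefficients: offset, tilt pair, and a bowl-like term.
coeffs = {(0, 0): 1.0, (1, 1): 0.5, (1, -1): 0.5, (2, 0): 5.0}

center = reconstruct(coeffs, 0.0, 0.0, wafer_radius=100.0)
edge = reconstruct(coeffs, 100.0, 0.0, wafer_radius=100.0)
print(center, edge)
```

Only the short coefficient table needs to be stored or transmitted; the shape at any requested location is regenerated on demand.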

[0040] The suggested method was first implemented and verified in a simulated environment. ANSYS finite element analysis software was used to generate wafer vibration modes and natural frequencies for a number of wafer diameters and loading conditions. Then the wafer shape measurement process, as affected by vibration, was modeled and simulated in Matlab. Generated shape data were processed according to the suggested method, yielding simulated shape and calibration information.

[0041] Later, shape reconstruction was applied to real world wafer shape data across an ADE platform to confirm the utility of the method. FIGS. 2-5 illustrate the benefit of the present invention in removing noise from the scan of a specimen, shown in topographic presentation in FIG. 2. In FIG. 2, both the noise inherent in the measurement instrument and the irregularities of the wafer are integrated. The wafer appears to have ridges of high points 200 that radiate from the center of the wafer, some areas of nominal height 210, and diffuse regions of high spots 230. It would be difficult to plan a smoothing operation on the wafer shown.

[0042] In FIG. 3, the noise of the measurement instrument is presented. Here, it is evident that, from a nominal height center 300, arced radial bands 310 extend to the circumference of the specimen 340. Some arcs 310 are compact, while others 320 have a more diffuse aspect. This topographic chart illustrates how the instrument vibrates the specimen in the process of rotating it for scanning. Comparing the scales for FIGS. 2 and 3 shows that the magnitude of the vibration noise is less than the overall irregularity in the specimen. FIG. 4 shows the same specimen's topography with the noise of FIG. 3 removed. Now it can be seen that the specimen has 3 high spots 400. Two of the high spots 400 exhibit a sharp gradient 410 between the nominal height of the specimen 430 and the high spot 400. The third high spot 400 exhibits a more gradual gradient 420 between the nominal height 430 and the high spot. Further processing of this topography can be planned.

[0043] FIG. 5 illustrates the repeatability of the noise reduced data. For the ten different measurement points, solid triangles 500, representing filtered data, show a bow of between approximately 10 and 11 microns. The solid squares 510, representing noisy data, show a bow of between approximately 9.5 and 12 microns.

[0044] The present invention operates to eliminate or reduce noise from noisy data measurements. While the description has exemplified its application to a wafer measurement system, it has application to other flat structures such as memory disks.

[0045] Having described preferred embodiments of the invention it will now become apparent to those of ordinary skill in the art that other embodiments incorporating these concepts may be used. Accordingly, it is submitted that the invention should not be limited by the described embodiments but rather should only be limited by the spirit and scope of the appended claims.

I claim:
 1. A process for noise reduction from noisy data representing an artifact at sample points in two dimensional space of a specimen comprising the steps of: receiving said noisy data as a vector, each element of which corresponds to one sample point; and calculating coefficients of a polynomial which converts said noisy data vector to a two dimensional function continuously representing the artifact in the two dimensional space.
 2. The process of claim 1 wherein said sample points lack regular geometrically proscribed locations on said specimen.
 3. The process of claim 1 wherein said specimen is a non-rectilinear specimen.
 4. The process of claim 1 wherein the sample points have a sufficiency to represent the spatial frequency of the noise to be reduced.
 5. The process of claim 1 wherein said polynomial is a Zernike polynomial.
 6. The process of claim 1 wherein said calculated coefficients are fewer in number than the number of sample points.
 7. The process of claim 1 wherein said noisy data is obtained using a measuring apparatus and wherein said calculating step includes the step of mathematically multiplying said data vector by a matrix representing a least squares fit between said data vector and the polynomial.
 8. The process of claim 7 wherein said matrix is a single value decomposition of said two dimensional space as applied to said apparatus.
 9. The process of claim 1 further comprising the step of calculating specimen spatial artifacts from said polynomial for one or more points in said two dimensional space.
 10. The process of claim 9 further comprising the step of transmitting said coefficients to a remote location prior to the calculation of spatial artifacts from said polynomial.
 11. A process for generating a noise correcting matrix for a measurement apparatus comprising: receiving data representative of artifacts in two dimensional space of a specimen obtained by said apparatus, each data point associated with a data position; and calculating a specimen-independent noise compensating matrix as a function of said data position in two dimensional space on said specimen.
 12. The process of claim 11 wherein said calculating step applies least squares fit analysis.
 13. The process of claim 11 wherein said matrix is of the form of a multiplier of Zernike polynomial decomposition coefficients.
 14. An apparatus for noise reduction from noisy data representing an artifact at sample points in two dimensional space of a specimen comprising: means for receiving said noisy data as a vector, each element of which corresponds to one sample point; and means for calculating coefficients of a polynomial which converts said noisy data vector to a two dimensional function continuously representing the artifact in the two dimensional space.
 15. The apparatus of claim 14 wherein said specimen is a non-rectilinear specimen.
 16. The apparatus of claim 14 wherein the sample points have a sufficiency to represent the spatial frequency of the noise to be reduced.
 17. The apparatus of claim 14 wherein said polynomial is a Zernike polynomial.
 18. The apparatus of claim 14 wherein said calculated coefficients are fewer in number than the number of data points.
 19. The apparatus of claim 14 wherein said noisy data is obtained using a measuring apparatus and wherein said calculating means includes means for mathematically multiplying said data vector by a matrix representing a least squares fit between the data vector and the polynomial.
 20. The apparatus of claim 19 wherein said matrix is a single value decomposition of said two dimensional space as applied to said measuring apparatus.
 21. The apparatus of claim 14 further comprising means for calculating specimen spatial artifacts from said polynomial for one or more points in said two dimensional space.
 22. The apparatus of claim 21 further comprising means for transmitting said coefficients to a remote location prior to the calculation of spatial artifacts from said polynomial.
 23. Apparatus for generating a noise correcting matrix for a measurement apparatus comprising: means for receiving data representative of artifacts in two dimensional space of a specimen obtained by said apparatus, each data point associated with a data position; and means for calculating a specimen-independent noise compensating matrix as a function of data position in two dimensional space on said specimen.
 24. The apparatus of claim 23 wherein said calculating means applies least squares fit analysis.
 25. The apparatus of claim 23 wherein said matrix is of the form of a multiplier of a Zernike polynomial without decomposition coefficients.
 26. The apparatus of claim 14 wherein said means for calculating coefficients is a computer.
 27. A model-based method of wafer shape reconstruction comprising: obtaining a set of noisy data points representing the wafer shape; using a complete set of Zernike polynomials as a shape functional space; applying a weighted least square fit between said noisy data points and a set of data points calculated from said Zernike polynomials; and finding decomposition coefficients for said wafer shape.
 28. The model-based method of claim 27 wherein said decomposition coefficients are a compact wafer shape data representation.
 29. The model-based method of claim 27 wherein said set of noisy data points form a scanning pattern that is not necessarily evenly spaced.
 30. The apparatus of claim 14, wherein said sample points lack regular geometrically proscribed locations on said specimen.